Better Learning and Decoding for Syntax Based SMT Using PSDIG

نویسندگان

  • Yuan Ding
  • Martha Palmer
چکیده

As an approach to syntax based statistical machine translation (SMT), Probabilistic Synchronous Dependency Insertion Grammars (PSDIG), introduced in (Ding and Palmer, 2005), are a version of synchronous grammars defined on dependency trees. In this paper we discuss better learning and decoding algorithms for a PSDIG MT system. We introduce two new grammar learners: (1) an exhaustive learner combining different heuristics, (2) an n-gram based grammar learner. Combining the grammar rules learned from the two learners improved the performance. We introduce a better decoding algorithm which incorporates a tri-gram language model. According to the Bleu metric, the PSDIG MT system performance is significantly better than IBM Model 4, while on par with the state-of-the-art phrase based system Pharaoh (Koehn, 2004). The improved integration of syntax on both source and target languages opens door to more sophisticated SMT processes.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Hybrid Machine Translation System Based on a Monotone Decoder

In this paper, a hybrid Machine Translation (MT) system is proposed by combining the result of a rule-based machine translation (RBMT) system with a statistical approach. The RBMT uses a set of linguistic rules for translation, which leads to better translation results in terms of word ordering and syntactic structure. On the other hand, SMT works better in lexical choice. Therefore, in our sys...

متن کامل

Improving Statistical Machine Translation using Lexicalized Rule Selection

This paper proposes a novel lexicalized approach for rule selection for syntax-based statistical machine translation (SMT). We build maximum entropy (MaxEnt) models which combine rich context information for selecting translation rules during decoding. We successfully integrate the MaxEnt-based rule selection models into the state-of-the-art syntax-based SMT model. Experiments show that our lex...

متن کامل

Combining Translation Memories and Syntax-Based SMT: Experiments with Real Industrial Data

One major drawback of using Translation Memories (TMs) in phrase-based Machine Translation (MT) is that only continuous phrases are considered. In contrast, syntax-based MT allows phrasal discontinuity by learning translation rules containing non-terminals. In this paper, we combine a TM with syntax-based MT via sparse features. These features are extracted during decoding based on translation ...

متن کامل

MT based on Probabilistic Synchronous Dependency Insertion Grammar (PSDIG)

This proposal is an extension to the project “Generation in the Context of Machine Translation”, which is one of the projects done in the Johns Hopkins University Center for Speech and Language Processing 2002 summer workshop (WS02GMT). We first address some issues observed in WS02GMT which possibly leads to detrimental effects in machine translation performance. And hence we propose a new appr...

متن کامل

Syntax-based Statistical Machine Translation

In its early development, machine translation adopted rule-based approaches, which can include the use of language syntax. The late 1980s and early 1990s saw the inception of the statistical machine translation (SMT) approach, where translation models can be learned automatically from a parallel corpus rather than created manually by humans. Initial SMT models were word-based and phrase-based, ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2006